Enhanced Transformer architecture for in-context learning of dynamical systems

Rufolo, Matteo, Piga, Dario, Maroni, Gabriele, Forgione, Marco

arXiv.org Artificial Intelligence

Recently introduced by some of the authors, the in-context identification paradigm aims at estimating, offline and based on synthetic data, a meta-model that describes the behavior of a whole class of systems. Once trained, this meta-model is fed with an observed input/output sequence (context) generated by a real system to predict its behavior in a zero-shot learning fashion. In this paper, we enhance the original meta-modeling framework through three key innovations: by formulating the learning task within a probabilistic framework; by managing non-contiguous context and query windows; and by adopting recurrent patching to effectively handle long context sequences. The efficacy of these modifications is demonstrated through a numerical example focusing on the Wiener-Hammerstein system class, highlighting the model's enhanced performance and scalability.
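The "recurrent patching" idea for long context sequences can be illustrated with a toy sketch: a long input/output sequence is cut into fixed-length patches that a recurrent encoder can consume one at a time. The function name and patch length below are hypothetical, not the authors' implementation.

```python
import numpy as np

def patch_sequence(context, patch_len):
    """Split a long context sequence into fixed-length patches.

    Hypothetical sketch of the 'recurrent patching' idea: patches can
    then be fed to a recurrent encoder one at a time, keeping memory
    cost bounded regardless of context length.
    """
    n = len(context)
    # pad the tail with zeros so the sequence divides evenly into patches
    pad = (-n) % patch_len
    padded = np.concatenate([context, np.zeros(pad)])
    return padded.reshape(-1, patch_len)

patches = patch_sequence(np.arange(10, dtype=float), patch_len=4)
print(patches.shape)  # (3, 4)
```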


Multi-Intent Detection in User Provided Annotations for Programming by Examples Systems

Kumar, Nischal Ashok, Gupta, Nitin, Guttula, Shanmukha, Patel, Hima

arXiv.org Artificial Intelligence

In integrating enterprise applications, data mapping remains a fundamental part of integration development, but it is time-consuming. An increasing number of applications lack naming standards, and nested field structures add further complexity for integration developers. Once the mapping is done, data transformation is the next challenge for users, since each application expects data to be in a certain format. While building an integration flow, developers must understand the formats of the source and target data fields and come up with a transformation program that converts data from the source to the target format. The problem of automatically generating a transformation program from a specification through the program synthesis paradigm has been studied since the early days of Artificial Intelligence (AI). Programming by Example (PBE) is one such technique, which aims to automatically infer a computer program that accomplishes a format or string conversion task from user-provided input and output samples. To learn the correct intent, a diverse set of samples from the user is required. However, the user may fail to provide a diverse set of samples, which can lead to multiple intents, or ambiguity, in the input and output samples; a PBE system may then generate a program for the wrong intent. In this paper, we propose a deep neural network based ambiguity prediction model, which analyzes the input-output strings and maps them to a set of properties responsible for multiple intents. Users can analyze these properties and accordingly provide new samples or modify existing ones, helping to build a better PBE system for mapping enterprise applications.
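The kind of ambiguity the abstract describes can be seen in a toy example: two distinct "programs" are consistent with a single input-output sample, and only a more diverse second sample separates them. The candidate programs below are illustrative, not the output of a real PBE system.

```python
# Two candidate "programs" that both explain the single example
# ("john.doe@acme.com" -> "john"); a hypothetical illustration of
# multiple intents in Programming by Example.
prog_a = lambda s: s.split(".")[0]                 # text before the first '.'
prog_b = lambda s: s.split("@")[0].split(".")[0]   # local part, then before '.'

sample = "john.doe@acme.com"
assert prog_a(sample) == prog_b(sample) == "john"  # one example: ambiguous

# A second, more diverse sample separates the two intents:
probe = "support@acme.com"
print(prog_a(probe))  # 'support@acme'
print(prog_b(probe))  # 'support'
```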


Multiple output samples for each input in a single-output Gaussian process

Wong, Jeremy H. M., Zhang, Huayun, Chen, Nancy F.

arXiv.org Artificial Intelligence

The standard Gaussian Process (GP) only considers a single output sample per input in the training set. Datasets for subjective tasks, such as spoken language assessment, may be annotated with output labels from multiple human raters per input. This paper proposes to generalise the GP to allow for these multiple output samples in the training set, and thus make use of available output uncertainty information. This differs from a multi-output GP, as all output samples are from the same task here. The output density function is formulated to be the joint likelihood of observing all output samples, and latent variables are not repeated to reduce computation cost. The test set predictions are inferred similarly to a standard GP, with a difference being in the optimised hyper-parameters. This is evaluated on speechocean762, showing that it allows the GP to compute a test set output distribution that is more similar to the collection of reference outputs from the multiple human raters.
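Under a Gaussian noise model, a standard identity makes the idea of not repeating latent variables concrete: m output samples attached to the same latent function value act like a single observation of their mean with noise variance divided by m. The sketch below applies this identity with an RBF kernel; it is an illustration under that assumption, not the paper's exact formulation.

```python
import numpy as np

# Sketch: observing m output samples y_1..y_m for the SAME latent f(x)
# under Gaussian noise gives a joint likelihood proportional to one
# observation of the per-input mean with noise variance sigma2 / m.
def gp_posterior_mean(X, Y_samples, x_star, sigma2=0.1, ls=1.0):
    """GP posterior mean at x_star with multiple output samples per input."""
    k = lambda a, b: np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls**2)
    y_bar = Y_samples.mean(axis=1)                   # per-input sample mean
    m = Y_samples.shape[1]                           # samples per input
    K = k(X, X) + (sigma2 / m) * np.eye(len(X))      # shrunken noise term
    return k(np.atleast_1d(x_star), X) @ np.linalg.solve(K, y_bar)

X = np.array([0.0, 1.0, 2.0])
Y = np.array([[0.9, 1.1], [1.9, 2.1], [3.0, 3.0]])  # two raters per input
print(gp_posterior_mean(X, Y, 1.0))
```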


Step-by-step guide on how to train GPT-2 on books using Google Colab

#artificialintelligence

We will use Google Drive to save our checkpoints (a checkpoint is our last saved trained model). Once the trained model is saved, we can load it whenever we want to generate both conditional and unconditional texts. Now that you have your Google Drive connected, let's create a checkpoints folder. Next, let's clone the GPT-2 repository that we will use. It is forked from nnsheperd's awesome repository (which is in turn forked from OpenAI's, with the awesome addition of train.py). I have added a conditional_model() method, which lets us pass multiple sentences at once and returns a dictionary with the relevant model output samples. It also lets us avoid using bash code.
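The checkpoints-folder step can be sketched in Python. The directory name below is arbitrary; the drive.mount call shown in the comment is the standard Colab API, kept out of the runnable part so the snippet works anywhere.

```python
import os

# Hypothetical sketch of the checkpoint-folder step. In Colab you would
# first mount Drive with:
#   from google.colab import drive; drive.mount('/content/drive')
# and point CHECKPOINT_DIR somewhere inside '/content/drive/MyDrive'.
# Here we use a local path so the snippet runs anywhere.
CHECKPOINT_DIR = os.path.join("checkpoints", "gpt2-run")

os.makedirs(CHECKPOINT_DIR, exist_ok=True)  # idempotent: safe to re-run
print(os.path.isdir(CHECKPOINT_DIR))  # True
```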


Involutive MCMC: a Unifying Framework

Neklyudov, Kirill, Welling, Max, Egorov, Evgenii, Vetrov, Dmitry

arXiv.org Machine Learning

Markov Chain Monte Carlo (MCMC) is a computational approach to fundamental problems such as inference, integration, optimization, and simulation. The field has developed a broad spectrum of algorithms, varying in the way they are motivated, the way they are applied, and how efficiently they sample. Despite all the differences, many of them share the same core principle, which we unify as the Involutive MCMC (iMCMC) framework. Building upon this, we describe a wide range of MCMC algorithms in terms of iMCMC, and formulate a number of "tricks" which one can use as design principles for developing new MCMC algorithms. Thus, iMCMC provides a unified view of many known MCMC algorithms, which facilitates the derivation of powerful extensions. We demonstrate the latter with two examples where we transform known reversible MCMC algorithms into their irreversible counterparts.

Table 1: List of algorithms that we describe by the Involutive MCMC framework. See their descriptions and formulations in terms of iMCMC in the corresponding appendices.

Name & Citation: Appendix
Metropolis-Hastings (Hastings, 1970): B.1
Mixture Proposal (Habib & Barber, 2018): B.2
Multiple-Try Metropolis (Liu et al., 2000): B.3
Sample-Adaptive MCMC (Zhu, 2019): B.4
Reversible-Jump MCMC (Green, 1995): B.5
Hybrid Monte Carlo (Duane et al., 1987): B.6
RMHMC (Girolami & Calderhead, 2011): B.7
NeuTra (Hoffman et al., 2019): B.8
A-NICE-MC (Song et al., 2017): B.9
L2HMC (Levy et al., 2017): B.10
Persistent HMC (Horowitz, 1991): B.11
Gibbs (Geman & Geman, 1984): B.12
Look Ahead (Sohl-Dickstein et al., 2014): B.13
NRJ (Gagnon & Doucet, 2019): B.14
Lifted MH (Turitsyn et al., 2011): B.15
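The core iMCMC recipe (sample an auxiliary variable, apply a deterministic involution, accept with a Metropolis ratio) can be sketched in a few lines. With the involution f(x, v) = (x + v, -v) and a standard normal auxiliary, it reduces to random-walk Metropolis-Hastings; this is a minimal sketch, not the paper's general formulation.

```python
import numpy as np

def imcmc_chain(log_p, x0, n_steps, rng):
    """Random-walk MH written as Involutive MCMC."""
    x, xs = x0, []
    for _ in range(n_steps):
        v = rng.normal()                 # auxiliary variable, v ~ N(0, 1)
        x_new, v_new = x + v, -v         # involution: applying it twice returns (x, v)
        # Metropolis ratio; |det J| = 1 and N(v) is symmetric, so the
        # auxiliary terms cancel and only the target ratio remains.
        if np.log(rng.uniform()) < log_p(x_new) - log_p(x):
            x = x_new
        xs.append(x)
    return np.array(xs)

rng = np.random.default_rng(0)
samples = imcmc_chain(lambda x: -0.5 * x**2, 0.0, 20000, rng)  # target N(0, 1)
print(samples.mean(), samples.var())  # roughly 0 and 1
```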


Learning Generative Models of Structured Signals from Their Superposition Using GANs with Application to Denoising and Demixing

Soltani, Mohammadreza, Jain, Swayambhoo, Sambasivan, Abhinav

arXiv.org Machine Learning

In general, the separation problem is inherently ill-posed; however, with enough structural assumptions on X and N, it has been established that separation is possible. Depending on the application, one might be interested in estimating only X (in this case, N is considered the corruption), which is referred to as denoising, or in recovering both X and N, which is referred to as demixing. Both demixing and denoising arise in a variety of important practical applications in the areas of signal/image processing, computer vision, machine learning, and statistics [Chen et al., 2001, Elad et al., 2005, Bobin et al., 2007, Candès et al., 2011]. Most existing techniques assume some prior knowledge of the structures of X and N in order to recover the desired component signal(s). Prior knowledge about the structure of X and N can only be obtained if one has access to the generative mechanism of the signals or to clean samples from the probability distribution defined over the sets X and N. In many practical settings, neither of these may be feasible. In this paper, we consider the problem of separating constituent signals from superposed observations when clean access to samples from the distribution is not available.
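The demixing formulation can be sketched with toy linear "generators" standing in for trained GANs, which makes the least-squares structure of the recovery explicit. All names and sizes below are illustrative, not the paper's method.

```python
import numpy as np

# Toy sketch of demixing: given a superposed observation
# y = G1(z1*) + G2(z2*), recover latent codes by gradient descent on
# ||y - G1(z1) - G2(z2)||^2. Linear maps stand in for trained GANs.
rng = np.random.default_rng(1)
G1, G2 = rng.normal(size=(20, 3)), rng.normal(size=(20, 3))
z1_true, z2_true = rng.normal(size=3), rng.normal(size=3)
y = G1 @ z1_true + G2 @ z2_true          # observed superposition

z1, z2 = np.zeros(3), np.zeros(3)
lr = 0.01
for _ in range(5000):
    r = y - G1 @ z1 - G2 @ z2            # residual of the superposition
    z1 += lr * G1.T @ r                  # gradient step on z1
    z2 += lr * G2.T @ r                  # gradient step on z2

print(np.linalg.norm(y - G1 @ z1 - G2 @ z2))  # near 0
```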


Learning with Weak Supervision from Physics and Data-Driven Constraints

Ren, Hongyu (Peking University) | Stewart, Russell (Stanford University) | Song, Jiaming (Stanford University) | Kuleshov, Volodymyr (Stanford University) | Ermon, Stefano (Stanford University)

AI Magazine

In many applications of machine learning, labeled data is scarce and obtaining additional labels is expensive. We introduce a new approach to supervising learning algorithms without labels by enforcing a small number of domain-specific constraints over the algorithms’ outputs. The constraints can be provided explicitly based on prior knowledge — e.g. we may require that objects detected in videos satisfy the laws of physics — or implicitly extracted from data using a novel framework inspired by adversarial training. We demonstrate the effectiveness of constraint-based learning on a variety of tasks — including tracking, object detection, and human pose estimation — and we find that algorithms supervised with constraints achieve high accuracies with only a small amount of labels, or with no labels at all in some cases.
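A physics constraint of the kind described (a tracked falling object should follow a parabola in time) can serve as a label-free training signal by penalizing deviation from the constraint. The sketch below is illustrative, not the paper's implementation: it scores a predicted height trajectory by its residual from the best-fit parabola, with no height labels needed.

```python
import numpy as np

def freefall_constraint_loss(t, h_pred):
    """Squared residual of h_pred from the best-fit parabola in t."""
    A = np.vander(t, 3)                            # columns: t^2, t, 1
    coef, *_ = np.linalg.lstsq(A, h_pred, rcond=None)
    return float(np.sum((A @ coef - h_pred) ** 2))

t = np.linspace(0, 1, 20)
good = 10 - 4.9 * t**2                             # consistent with gravity
bad = 10 - 4.9 * t**3                              # violates the constraint
print(freefall_constraint_loss(t, good))           # ~0
print(freefall_constraint_loss(t, bad) > 1e-3)     # True
```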


Raw Waveform-based Speech Enhancement by Fully Convolutional Networks

Fu, Szu-Wei, Tsao, Yu, Lu, Xugang, Kawai, Hisashi

arXiv.org Machine Learning

This study proposes a fully convolutional network (FCN) model for raw waveform-based speech enhancement. The proposed system performs speech enhancement in an end-to-end (i.e., waveform-in and waveform-out) manner, which differs from most existing denoising methods that process the magnitude spectrum (e.g., log power spectrum (LPS)) only. Because the fully connected layers, which are involved in deep neural networks (DNN) and convolutional neural networks (CNN), may not accurately characterize the local information of speech signals, particularly with high frequency components, we employed fully convolutional layers to model the waveform. More specifically, FCN consists of only convolutional layers and thus the local temporal structures of speech signals can be efficiently and effectively preserved with relatively few weights. Experimental results show that DNN- and CNN-based models have limited capability to restore high frequency components of waveforms, thus leading to decreased intelligibility of enhanced speech. By contrast, the proposed FCN model can not only effectively recover the waveforms but also outperform the LPS-based DNN baseline in terms of short-time objective intelligibility (STOI) and perceptual evaluation of speech quality (PESQ). In addition, the number of model parameters in FCN is approximately only 0.2% compared with that in both DNN and CNN.
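The length-preserving property that makes the model "waveform-in, waveform-out" can be sketched with plain 1-D convolutions: a stack of convolutional layers maps a waveform of any length to an output of the same length, with a weight count independent of the input length. The filters below are arbitrary, not trained.

```python
import numpy as np

def fcn_1d(x, kernels):
    """Toy fully convolutional 1-D model: conv + ReLU per layer."""
    for k in kernels:
        x = np.convolve(x, k, mode="same")   # length-preserving convolution
        x = np.maximum(x, 0.0)               # ReLU nonlinearity
    return x

kernels = [np.array([0.25, 0.5, 0.25])] * 3  # 3 layers, only 9 weights total
for n in (160, 1600):                        # waveforms of different lengths
    y = fcn_1d(np.random.default_rng(0).normal(size=n), kernels)
    print(len(y) == n)                       # True: waveform-in, waveform-out
```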